    Gaussian mixture gain priors for regularized nonnegative matrix factorization in single-channel source separation

    We propose a new method to incorporate statistical priors on the solution of nonnegative matrix factorization (NMF) for single-channel source separation (SCSS) applications. A Gaussian mixture model (GMM) is used as a prior model for the log-normalized NMF gains. The normalization makes the prior models energy independent. In NMF-based SCSS, NMF is used to decompose the spectra of the observed mixed signal as a weighted linear combination of a set of trained basis vectors. In this work, the NMF decomposition weights are required to respect statistical prior information on the combination patterns of weights that the trained basis vectors of each source can jointly receive in the observed mixed signal. The NMF weight solutions are encouraged to increase the log-likelihood under the trained gain prior GMMs while simultaneously reducing the NMF reconstruction error.
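
    The regularized objective described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a generalized Kullback-Leibler (KL) reconstruction cost, a diagonal-covariance GMM, log-normalization defined as the log of each gain divided by the sum of its frame's gains, and a hypothetical trade-off weight `lam`; all function names are hypothetical.

```python
import numpy as np

def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH), a common NMF reconstruction cost."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))

def log_normalize(H, eps=1e-12):
    """Assumed normalization: log of each gain divided by its frame's gain sum,
    so scaling a frame's energy leaves the normalized gains unchanged."""
    return np.log(H + eps) - np.log(H.sum(axis=0, keepdims=True) + eps)

def gmm_loglik(X, weights, means, variances):
    """Total log-likelihood of the columns of X under a diagonal-covariance GMM.
    X: (d, n); weights: (K,); means, variances: (K, d)."""
    diff = X.T[:, None, :] - means[None, :, :]                     # (n, K, d)
    log_comp = -0.5 * np.sum(diff**2 / variances + np.log(2 * np.pi * variances), axis=2)
    log_comp += np.log(weights)[None, :]                           # (n, K)
    m = log_comp.max(axis=1, keepdims=True)                        # log-sum-exp over components
    return float(np.sum(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))))

def regularized_cost(V, W, H, gmm, lam=0.1):
    """Reconstruction error minus lam times the gain-prior log-likelihood.
    gmm is a (weights, means, variances) tuple with means of shape (K, rank)."""
    return kl_divergence(V, W @ H) - lam * gmm_loglik(log_normalize(H), *gmm)
```

    Minimizing `regularized_cost` over H (for example with a gradient-based or multiplicative scheme) trades reconstruction accuracy against agreement with the trained gain prior; because the gains are normalized per frame, scaling a frame's energy does not change the prior term.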

    Semi-blind speech-music separation using sparsity and continuity priors

    In this paper, we propose an approach to single-channel source separation of speech and music signals. Our approach represents each source's power spectral density (PSD) using dictionaries and nonlinearly projects the mixture signal spectrum onto the combined span of the dictionary entries. We encourage sparsity and continuity of the dictionary coefficients using penalty terms (or log-priors) in an optimization framework. We propose a novel coordinate descent technique for the optimization, which handles nonnegativity constraints and nonquadratic penalty terms well. After the PSDs of the sources are estimated, we use an adaptive Wiener filter and spectral subtraction to reconstruct both sources from the mixture. Using conventional metrics, we measure the performance of the system on simulated mixtures of single-speaker speech and piano music. The results indicate that the proposed method is promising for low speech-to-music ratio conditions and that the sparsity and continuity priors help improve the performance of the proposed system.
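
    The coordinate descent idea can be illustrated with a simplified sketch. It assumes a squared-Euclidean reconstruction cost in place of the paper's nonlinear projection, an L1 sparsity penalty weighted by `lam_s`, and a quadratic temporal continuity penalty weighted by `lam_c`; these choices and all names are hypothetical. With these penalties each coefficient has a closed-form one-dimensional minimizer, which is clipped at zero to enforce nonnegativity.

```python
import numpy as np

def coordinate_descent(V, D, lam_s=0.1, lam_c=0.1, n_iter=50):
    """Nonnegative coordinate descent (hypothetical sketch) for
        0.5*||V - D H||^2 + lam_s*sum(H) + 0.5*lam_c*sum_t ||H[:, t] - H[:, t-1]||^2.
    V: (F, T) mixture frames; D: (F, K) concatenated dictionaries."""
    K, T = D.shape[1], V.shape[1]
    H = np.zeros((K, T))
    R = V - D @ H                              # residual, kept up to date
    col_norms = np.sum(D**2, axis=0)
    for _ in range(n_iter):
        for t in range(T):
            for k in range(K):
                old = H[k, t]
                a = col_norms[k]
                # correlation of atom k with the residual that excludes atom k
                b = D[:, k] @ R[:, t] + a * old
                # continuity terms pull the coefficient toward its temporal neighbours
                if t > 0:
                    a += lam_c
                    b += lam_c * H[k, t - 1]
                if t < T - 1:
                    a += lam_c
                    b += lam_c * H[k, t + 1]
                # closed-form minimizer with the L1 shift, clipped at zero
                new = max(0.0, (b - lam_s) / a) if a > 0 else 0.0
                if new != old:
                    R[:, t] -= D[:, k] * (new - old)
                    H[k, t] = new
    return H
```

    Here the rows of H that belong to each dictionary give that source's PSD estimate, which a Wiener-style filter could then use for reconstruction.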

    Single channel speech music separation using nonnegative matrix factorization and spectral masks

    This work proposes a single-channel speech-music separation algorithm based on nonnegative matrix factorization (NMF) with spectral masks. The algorithm uses training data of speech and music signals, applying NMF followed by masking to separate the mixed signal. In the training stage, NMF is applied to the training data in the magnitude spectrum domain to learn a set of basis vectors for each source. After the mixed signal is observed, NMF decomposes its magnitude spectra into a linear combination of the trained bases of both sources. The decomposition results are used to build a mask that describes the contribution of each source to the mixed signal. Experimental results show that applying masks after NMF improves separation even when NMF is run with fewer iterations, which yields faster separation.
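
    The train-decompose-mask pipeline can be sketched with KL-divergence NMF and multiplicative updates. The divergence, the iteration counts, and the soft ratio mask are assumptions made for illustration (the paper's exact cost and mask may differ); `nmf_kl` and `separate` are hypothetical names.

```python
import numpy as np

def nmf_kl(V, W=None, rank=None, n_iter=200, update_W=True, seed=0, eps=1e-12):
    """KL-divergence NMF with multiplicative updates; W is kept fixed if update_W=False."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps if W is None else np.array(W, dtype=float)
    H = rng.random((W.shape[1], T)) + eps
    ones = np.ones_like(V, dtype=float)
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / (W.T @ ones + eps)
        if update_W:
            W *= ((V / (W @ H + eps)) @ H.T) / (ones @ H.T + eps)
    return W, H

def separate(V_mix, W_speech, W_music, n_iter=50, eps=1e-12):
    """Decompose the mixture on the concatenated trained bases, then apply soft masks."""
    W = np.hstack([W_speech, W_music])
    _, H = nmf_kl(V_mix, W=W, n_iter=n_iter, update_W=False)
    k = W_speech.shape[1]
    S_hat = W_speech @ H[:k]                   # initial speech magnitude estimate
    M_hat = W_music @ H[k:]                    # initial music magnitude estimate
    mask = S_hat / (S_hat + M_hat + eps)       # assumed soft (ratio) mask
    return mask * V_mix, (1.0 - mask) * V_mix
```

    For example, `W_speech, _ = nmf_kl(V_speech_train, rank=32)` trains speech bases, and `separate(V_mix, W_speech, W_music)` returns the masked magnitude estimates. Because the two masks sum to one, the estimates always add up to the mixture magnitude.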

    Hidden Markov models as priors for regularized nonnegative matrix factorization in single-channel source separation

    We propose a new method to incorporate rich statistical priors that model temporal gain sequences into the solutions of nonnegative matrix factorization (NMF). The proposed method can be used for single-channel source separation (SCSS) applications. In NMF-based SCSS, NMF is used to decompose the spectra of the observed mixed signal as a weighted linear combination of a set of trained basis vectors. In this work, the NMF decomposition weights are required to respect statistical and temporal prior information on the combination patterns of weights that the trained basis vectors of each source can jointly receive in the observed mixed signal. A hidden Markov model (HMM) is used as a prior model for the log-normalized gains (weights) of the NMF solution. The normalization makes the prior models energy independent. The HMM serves as a rich model that characterizes the statistics of sequential data. The NMF weight solutions are encouraged to increase the log-likelihood under the trained gain prior HMMs while simultaneously reducing the NMF reconstruction error.
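
    The HMM prior term can be evaluated with the forward algorithm. The sketch below computes the log-likelihood of a sequence of log-normalized gain vectors under an HMM with diagonal Gaussian emissions; the emission model and all names are assumptions, and in a regularized NMF objective this value would be scaled by a weight and subtracted from the reconstruction error.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """log N(x_t; mu_j, diag(var_j)) for every frame t and state j -> (T, J)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(diff**2 / variances + np.log(2 * np.pi * variances), axis=2)

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def hmm_loglik(X, log_pi, log_A, means, variances):
    """Forward algorithm in the log domain.
    X: (T, d) sequence of log-normalized gain vectors, one row per frame.
    log_pi: (J,) initial log-probabilities; log_A: (J, J) transition log-probabilities."""
    log_b = log_gauss_diag(X, means, variances)            # (T, J) emission log-likelihoods
    alpha = log_pi + log_b[0]
    for t in range(1, X.shape[0]):
        # alpha_t(j) = b_j(x_t) * sum_i alpha_{t-1}(i) * A[i, j], computed in logs
        alpha = log_b[t] + logsumexp(alpha[:, None] + log_A, axis=0)
    return float(logsumexp(alpha, axis=0))
```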

    Spectro-temporal post-enhancement using MMSE estimation in NMF based single-channel source separation

    We propose to use minimum mean squared error (MMSE) estimates to enhance signals separated by nonnegative matrix factorization (NMF). In single-channel source separation (SCSS), NMF is used to train a set of basis vectors for each source from its training spectrograms. NMF is then used to decompose the mixed signal spectrogram as a weighted linear combination of the trained basis vectors, from which an estimate of each source can be obtained. In this work, we treat the spectrogram of each separated signal as a distorted 2D signal that needs to be restored. We assume a multiplicative distortion model in which the distribution of the logarithm of the true signal is modeled by a Gaussian mixture model (GMM) and the distortion is modeled as log-normally distributed. The GMM parameters are learned from training data, whereas the distortion parameters are learned online from each separated signal. The initial source estimates are then replaced by their MMSE estimates under this probabilistic framework. Experimental results show that using the proposed MMSE estimation as a post-enhancement step after NMF improves the quality of the separated signals.
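
    Under the stated model, taking logarithms turns the multiplicative distortion into an additive one, y = x + d, with a GMM prior on x and Gaussian d. The sketch below computes the resulting MMSE estimates per time-frequency bin; treating each bin independently with scalar Gaussians is a simplifying assumption (the paper models the spectrogram as a 2D signal), and the function name is hypothetical.

```python
import numpy as np

def mmse_log_gmm(y, weights, means, variances, mu_d, var_d):
    """MMSE estimates of x from y = x + d (log domain), where
    x ~ sum_k w_k N(mu_k, var_k) and d ~ N(mu_d, var_d).
    y may be an array of log-magnitudes; each entry is treated independently.
    Returns (E[x | y], E[exp(x) | y])."""
    y = np.asarray(y, dtype=float)[..., None]              # broadcast over components
    # Responsibility of component k given y, using y | k ~ N(mu_k + mu_d, var_k + var_d)
    s2 = variances + var_d
    log_r = np.log(weights) - 0.5 * (np.log(2 * np.pi * s2) + (y - means - mu_d) ** 2 / s2)
    log_r -= log_r.max(axis=-1, keepdims=True)
    r = np.exp(log_r)
    r /= r.sum(axis=-1, keepdims=True)
    # Gaussian posterior of x given y and k (product of prior and likelihood Gaussians)
    post_var = 1.0 / (1.0 / variances + 1.0 / var_d)
    post_mean = post_var * (means / variances + (y - mu_d) / var_d)
    log_est = np.sum(r * post_mean, axis=-1)                            # E[x | y]
    lin_est = np.sum(r * np.exp(post_mean + 0.5 * post_var), axis=-1)   # E[exp(x) | y]
    return log_est, lin_est
```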

    Filler model based confidence measures for spoken dialogue systems: a case study for Turkish

    Because speech recognition systems do not perform perfectly, an accurate confidence scoring mechanism is needed to understand user requests correctly. To determine a confidence score for a hypothesis, several confidence features are combined. This work investigates the performance of filler-model-based confidence features. Five types of filler model networks were defined: a triphone network, a phone network, a phone-class network, a 5-state catch-all model, and a 3-state catch-all model. First, all models were evaluated on a Turkish speech recognition task in terms of their ability to correctly tag recognition hypotheses as correct or as recognition errors. The best performance was obtained with the triphone recognition network. Then, the performance of reliable combinations of these models was investigated, and certain combinations of filler models were observed to significantly improve the accuracy of the confidence annotation.
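
    A common way to build a filler-model confidence feature is a frame-normalized log-likelihood ratio between the recognized hypothesis and a filler-model decoding of the same segment. The sketch below assumes that form, a hypothetical weighted average for combining several filler models, and a decision threshold tuned on held-out data; it is not necessarily the exact feature set of the study, and all names are hypothetical.

```python
import numpy as np

def filler_confidence(hyp_loglik, filler_loglik, n_frames):
    """Frame-normalized log-likelihood ratio between the recognized hypothesis
    and a filler (garbage) model decoding of the same speech segment."""
    return (hyp_loglik - filler_loglik) / n_frames

def combined_confidence(hyp_loglik, filler_logliks, n_frames, weights=None):
    """Hypothetical combination: weighted average of per-filler-model scores."""
    scores = np.array([filler_confidence(hyp_loglik, f, n_frames) for f in filler_logliks])
    if weights is None:
        weights = np.full(len(scores), 1.0 / len(scores))
    return float(np.dot(weights, scores))

def tag(score, threshold):
    """Tag a hypothesis as 'correct' or 'recognition-error' (threshold tuned on held-out data)."""
    return "correct" if score >= threshold else "recognition-error"
```

    Since an unconstrained filler network usually scores at least as well as the constrained hypothesis, these ratios tend to be at or below zero, so a useful threshold is typically negative.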

    Single channel speech-music separation using matching pursuit and spectral masks

    This work proposes a single-channel speech-music separation algorithm based on matching pursuit (MP) with multiple dictionaries and spectral masks. Training data for speech and music signals are used to build two sets of magnitude spectral vectors, one for each source. These sets of vectors are called dictionaries, and the vectors are called atoms. MP is used to sparsely decompose the magnitude spectrum of the observed mixed signal as a nonnegative weighted linear combination of the atoms in the two dictionaries that best match the mixed signal structure. The weighted sum of the decomposition terms that use atoms from the speech dictionary serves as an initial estimate of the speech contribution to the mixed signal, and the weighted sum of the remaining terms serves as an initial estimate of the music contribution. The initial estimate of each source is used to build a spectral mask, which is then used to reconstruct the source signals. Experimental results show that integrating MP with spectral masks gives good separation results.
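
    The decomposition-and-mask procedure can be sketched with a greedy matching pursuit restricted to nonnegative coefficients. It assumes dictionaries with unit-norm columns (for example `D / np.linalg.norm(D, axis=0)`), a fixed maximum number of atoms per frame, and a soft ratio mask; all names are hypothetical.

```python
import numpy as np

def nonneg_matching_pursuit(x, D, n_atoms=10):
    """Greedy matching pursuit with nonnegative coefficients.
    x: (F,) magnitude spectrum; D: (F, K) dictionary with unit-norm columns."""
    residual = np.asarray(x, dtype=float).copy()
    coefs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ residual
        k = int(np.argmax(corr))
        if corr[k] <= 0:              # no atom can reduce the residual with a positive weight
            break
        coefs[k] += corr[k]
        residual -= corr[k] * D[:, k]
    return coefs

def separate_frame(x, D_speech, D_music, n_atoms=10, eps=1e-12):
    """Split one mixture frame using MP over both dictionaries and a soft mask."""
    D = np.hstack([D_speech, D_music])
    c = nonneg_matching_pursuit(x, D, n_atoms)
    k = D_speech.shape[1]
    s0 = D_speech @ c[:k]             # initial speech estimate (speech-atom terms)
    m0 = D_music @ c[k:]              # initial music estimate (remaining terms)
    total = s0 + m0
    # soft ratio mask; bins that no atom explains are split evenly (an assumed choice)
    mask = np.divide(s0, total, out=np.full_like(total, 0.5), where=total > eps)
    return mask * x, (1.0 - mask) * x
```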

    Single channel speech music separation using nonnegative matrix factorization with sliding windows and spectral masks

    This work proposes a single-channel speech-music separation algorithm based on nonnegative matrix factorization (NMF) with sliding windows and spectral masks. We train a set of basis vectors for each source signal using NMF in the magnitude spectral domain. Rather than forming each column of the matrices to be decomposed by NMF from a single spectral frame, we build it from multiple spectral frames stacked into one column. After the mixed signal is observed, NMF decomposes its magnitude spectra into a weighted linear combination of the trained basis vectors of both sources. An initial spectrogram estimate is found for each source, and a spectral mask is built from these initial estimates. This mask weights the mixed signal spectrogram to find the contribution of each source signal. The method is shown to perform better than the conventional NMF approach.
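
    Stacking consecutive frames into NMF columns, and mapping stacked estimates back to a spectrogram, can be sketched as follows. Averaging the overlapping estimates of each frame on the way back is an assumed reconstruction choice, and the function names are hypothetical.

```python
import numpy as np

def stack_frames(S, L):
    """Stack L consecutive spectral frames into one column.
    S: (F, T) magnitude spectrogram -> (F*L, T-L+1); column j holds frames j..j+L-1."""
    F, T = S.shape
    return np.vstack([S[:, i:T - L + 1 + i] for i in range(L)])

def unstack_frames(X, F, L):
    """Invert stack_frames by averaging the overlapping estimates of each frame."""
    n_cols = X.shape[1]
    T = n_cols + L - 1
    out = np.zeros((F, T))
    count = np.zeros(T)
    for i in range(L):
        # block i of every column is an estimate of frame (column index + i)
        out[:, i:i + n_cols] += X[i * F:(i + 1) * F, :]
        count[i:i + n_cols] += 1
    return out / count
```

    The stacked matrices of the training and mixture spectrograms can be fed to NMF unchanged; the stacked source estimates are then unstacked before the masks are built.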

    Using local temporal features of bounding boxes for walking/running classification

    For intelligent surveillance, one of the major tasks is recognizing the activities present in a scene of interest. Human subjects are the most important elements in a surveillance system, so classifying human actions is crucial. In this paper, we tackle the problem of classifying human actions in videos as running or walking. We propose using local temporal features extracted from the rectangular bounding boxes that surround the subject of interest in each frame. We test the system on a database of hand-labeled walking and running videos. Our experiments yield a classification error rate of 2.5% using period-based features and the local speed computed over a range of frames around the current frame. Shorter-range time-derivative features are not very useful because they are highly variable. Our results show that the system correctly recognizes running and walking despite differences in the appearance and clothing of subjects.
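
    The two kinds of feature can be sketched from a per-frame array of bounding boxes. The exact definitions below are assumptions for illustration: local speed is the horizontal centroid displacement over frames t - r to t + r, normalized by box height, and the period is the lag of the strongest autocorrelation peak of a box-dimension signal such as its width. Function names are hypothetical.

```python
import numpy as np

def local_speed(boxes, t, r):
    """Horizontal centroid speed around frame t, using frames t-r..t+r.
    boxes: (T, 4) array of (x, y, w, h); result is in box heights per frame
    (normalizing by height is an assumption for scale invariance)."""
    boxes = np.asarray(boxes, dtype=float)
    lo, hi = max(0, t - r), min(len(boxes) - 1, t + r)
    if hi == lo:
        return 0.0
    cx = boxes[:, 0] + boxes[:, 2] / 2.0
    mean_h = boxes[lo:hi + 1, 3].mean()
    return float(abs(cx[hi] - cx[lo]) / ((hi - lo) * mean_h))

def dominant_period(signal):
    """Period (in frames) of the strongest autocorrelation peak of a 1-D signal,
    searched after the autocorrelation first turns negative; 0 if none is found."""
    s = np.asarray(signal, dtype=float)
    s = s - s.mean()
    ac = np.correlate(s, s, mode="full")[len(s) - 1:]     # ac[k] for lags k >= 0
    neg = np.nonzero(ac < 0)[0]
    start = int(neg[0]) if len(neg) else 1                # skip the lag-0 lobe
    if start >= len(s) // 2:
        return 0
    return int(start + np.argmax(ac[start:len(s) // 2]))
```

    Running subjects would be expected to show a higher local speed and a shorter gait period than walking ones; the features can then be passed to any standard classifier.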

    Güvenilir biyometrik kıyım yöntemi (Trustworthy biometric hashing method)

    In this paper, we propose a novel biometric hashing method. We apply a password-generated random projection matrix directly to face images, rather than to features extracted from them, improving on methods in the literature. We aim to preserve privacy while achieving the desired accuracy in a biometric verification system. Verification is performed in the hash domain, and irreversibility is ensured. In addition, a new hash value can be obtained simply by changing the password, which provides the cancelable biometrics property. We achieve a zero equal error rate (EER) on the Carnegie Mellon University face database. Furthermore, we achieve an EER of 0.0061 even if attackers compromise both the password and the random number generator. We also test the robustness of the proposed system against degradations due to sensor and environment imperfections: for all distortions, the error norm remains below the optimal threshold obtained at the EER.
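
    The hashing and verification steps can be sketched as follows. The password-to-seed mapping (SHA-256), the Gaussian projection matrix, the per-image standardization, and median thresholding are all assumptions for illustration rather than the method of the paper; function names are hypothetical.

```python
import hashlib
import numpy as np

def projection_matrix(password, n_bits, dim):
    """Password-seeded random projection matrix (the seeding scheme is an assumption)."""
    seed = int.from_bytes(hashlib.sha256(password.encode("utf-8")).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_bits, dim))

def biohash(image, password, n_bits=64):
    """Project the raw, flattened face image (no feature extraction) and binarize.
    Thresholding at the median of the projections is an assumed choice."""
    x = np.asarray(image, dtype=float).ravel()
    x = (x - x.mean()) / (x.std() + 1e-12)
    y = projection_matrix(password, n_bits, x.size) @ x
    return (y > np.median(y)).astype(np.uint8)

def hamming(h1, h2):
    """Normalized Hamming distance, used for verification in the hash domain."""
    return float(np.mean(h1 != h2))
```

    Verification would accept a claim when `hamming(stored_hash, biohash(probe_image, password))` falls below a threshold chosen on development data; issuing a new password yields an unrelated hash for the same face.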